Sharded fields dump #1738
The PR is incomplete (does not include the dft fields) and also has a lot of debugging statements.
Note that the CI shows that the C++ unit test added in this PR (dump_load.cpp) is failing, which should probably be fixed first:
============================================
meep 1.21.0-beta: tests/test-suite.log
============================================
# TOTAL: 20
# PASS: 19
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: dump_load
===============
Using MPI version 3.1, 2 processes
Testing 3D dump/load: temp_dir = /tmp/meepQOXddW...
Testing pml quality...
Dumping structure: /tmp/meepQOXddW/test_pml-structure-original
Dumping fields: /tmp/meepQOXddW/test_pml-fields-original
Got newE/oldE of 1.66447e-05
Got newE/oldE of -3.83624e-08
Got newE/oldE of 2.1382e-09
Dumping structure: /tmp/meepQOXddW/test_pml-structure-after-sim
Dumping fields: /tmp/meepQOXddW/test_pml-fields-after_sim
Loading structure: /tmp/meepQOXddW/test_pml-structure-after-sim
Dumping structure: /tmp/meepQOXddW/test_pml-structure-dump-loaded
Loading fields: /tmp/meepQOXddW/test_pml-fields-after_sim
meep: error on line 244 of ../../../src/h5file.cpp: missing dataset in HDF5 file
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
FAIL dump_load (exit status: 1)
tests/dump_load.cpp
Outdated
f1.step();
if (!compare_point(f, f1, vec(0.5, 0.01, 0.5))) return 0;
if (!compare_point(f, f1, vec(0.46, 0.33, 0.2))) return 0;
if (!compare_point(f, f1, vec(1.0, 0.25, 0.301))) return 0;
This series of unit tests (which seems to be copied from tests/three_d.cpp) compares the fields from two different structures (with and without chunk splitting) but does not involve anything related to either the fields or structure dump introduced in this PR. Is it necessary?
tests/dump_load.cpp
Outdated
}

std::string structure_filename_after_sim =
    structure_dump(&s, filename_prefix, "after-sim");
For most practical applications the structure is time independent and thus the structure object can be dumped to file just once at any point during the simulation.
for (int i = 0; i < num_chunks; i++) {
  my_num_chunks += (single_parallel_file || chunks[i]->is_mine());
}
size_t num_f_size = my_num_chunks * NUM_FIELD_COMPONENTS * 2;
Meep uses real fields by default and switches to complex fields only under certain conditions (e.g., a non-zero k_point, force_complex_fields=True in the Simulation constructor, etc.). The hard-coded factor of 2 in this line should therefore be changed to either 1 or 2, depending on whether the complex field component is a non-NULL pointer.
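As a rough illustration of the suggestion above, the following Python sketch picks the factor by checking whether any imaginary part is actually allocated instead of assuming 2. The data layout (a list of per-chunk dicts mapping a component name to a `[real, imag]` pair, with `None` standing in for a NULL pointer) and the helper name `count_size_entries` are hypothetical; they only mimic the C++ structure under discussion.

```python
def count_size_entries(chunks, num_components):
    """Return how many size slots are needed for the given chunks.

    Each chunk is a dict mapping a component name to [real_part, imag_part].
    None stands in for a NULL pointer (hypothetical layout for this sketch).
    """
    # Complex fields are in use only if some imaginary part is allocated.
    uses_complex = any(
        parts[1] is not None for chunk in chunks for parts in chunk.values()
    )
    parts_per_component = 2 if uses_complex else 1
    return len(chunks) * num_components * parts_per_component


# Two chunks holding only real fields, and two chunks holding complex fields.
real_chunks = [{"Ez": [[0.0] * 4, None]}, {"Ez": [[0.0] * 4, None]}]
complex_chunks = [{"Ez": [[0.0] * 4, [0.0] * 4]}, {"Ez": [[0.0] * 4, [0.0] * 4]}]

real_count = count_size_entries(real_chunks, num_components=3)
complex_count = count_size_entries(complex_chunks, num_components=3)
```

With real fields the count is chunks × components; it doubles only when an imaginary array is present.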
If !is_real then the imaginary pointers are NULL and so it should do the right thing, I think?
A few lines below you can see that the dump (or load) is done only for the non-null complex field components.
The error reported above for |
I think there is a bug in
# Run until t=1 so that fields are initialized.
sim.run(mp.at_every(5, get_field_point), until=1)
After removing this line as well as
# Tests dumping/loading of fields & structure.
def _load_dump_fields(self, single_parallel_file=True):
    resolution = 50
    cell = mp.Vector3(5, 5)
    sources = mp.Source(src=mp.GaussianSource(1, fwidth=1.0),
                        center=mp.Vector3(),
                        component=mp.Ez)
    sim1 = mp.Simulation(resolution=resolution,
                         cell_size=cell,
                         sources=[sources])
    sample_point = mp.Vector3(0.12, -0.29)
    ref_field_points = {}

    def get_ref_field_point(sim):
        p = sim.get_field_point(mp.Ez, sample_point)
        ref_field_points[sim.meep_time()] = p.real

    # First run until t=25 and save structure/fields
    sim1.run(mp.at_every(5, get_ref_field_point), until=25)
    dump_dirname = os.path.join(self.temp_dir, 'test_load_dump_fields')
    sim1.dump(dump_dirname,
              dump_structure=False,
              dump_fields=True,
              single_parallel_file=single_parallel_file)

    # Then continue running another 25 until t=50
    sim1.run(mp.at_every(5, get_ref_field_point), until=25)
    print('ref_field_points = ' + str(ref_field_points))

    # Now create a new simulation and try restoring state.
    sim = mp.Simulation(resolution=resolution,
                        cell_size=cell,
                        sources=[sources])
    sim.init_sim()
    field_points = {}

    def get_field_point(sim):
        p = sim.get_field_point(mp.Ez, sample_point)
        field_points[sim.meep_time()] = p.real

    # Now load the fields.
    sim.load(dump_dirname,
             load_structure=False,
             load_fields=True,
             single_parallel_file=single_parallel_file)
    sim.run(mp.at_every(5, get_field_point), until=25)
    print('field_points = ' + str(field_points))

    for t, v in field_points.items():
        self.assertAlmostEqual(ref_field_points[t], v)
output
|
In
with
The subtest
The reason for this test failure is simple: there is a Line 1466 in cdae1a8
As a verification that this is indeed the cause of the failing test, replace The slight challenge though for dumping/loading the Lines 1424 to 1429 in cdae1a8
in which the Lines 73 to 75 in cdae1a8
Dispersive materials such as Lines 236 to 240 in cdae1a8
There are other derived classes of |
Codecov Report
@@ Coverage Diff @@
## master #1738 +/- ##
==========================================
+ Coverage 73.46% 74.39% +0.92%
==========================================
Files 13 13
Lines 4557 4581 +24
==========================================
+ Hits 3348 3408 +60
+ Misses 1209 1173 -36
|
The fact that you have to use |
python/tests/test_dump_load.py
Outdated
p = sim.get_field_point(mp.Ez, sample_point)
field_points[sim.meep_time()] = p.real

sim.init_sim()
Removing this line with sim.init_sim() causes the subtest test_load_dump_fields to fail with a segmentation fault after the previously saved fields are loaded from the HDF5 file and sim.run is invoked.
Here is the output from gdb:
Running test_load_dump_fields
-----------
Initializing structure...
Halving computational cell along direction y
time for choose_chunkdivision = 0.000307183 s
Working in 2D dimensions.
Computational cell is 5 x 5 x 0 with resolution 50
block, center = (0,0,0)
size (1,1,1e+20)
axes (1,0,0), (0,1,0), (0,0,1)
dielectric constant epsilon diagonal = (10.24,10.24,10.24)
block, center = (1,0,0)
size (1,1,1e+20)
axes (1,0,0), (0,1,0), (0,0,1)
dielectric constant epsilon diagonal = (13,13,13)
time for set_epsilon = 0.0874734 s
-----------
run 0 finished at t = 15.0 (1500 timesteps)
creating epsilon from file "/tmp/meepO1ahOS/test_load_dump_fields/structure.h5" (1)...
Dumped structure to file: /tmp/meepO1ahOS/test_load_dump_fields/structure.h5 (True)
creating fields output file "/tmp/meepO1ahOS/test_load_dump_fields/fields.h5" (1)...
Fields State:
a = 50, dt = 0.01
m = 0, beta = 0
t = 1500, phasein_time = 0, is_real = 1
num_chunks = 6 (shared=1)
Dumped fields to file: /tmp/meepO1ahOS/test_load_dump_fields/fields.h5 (True)
run 1 finished at t = 20.0 (2000 timesteps)
-----------
Initializing structure...
Halving computational cell along direction y
time for choose_chunkdivision = 0.000286404 s
Working in 2D dimensions.
Computational cell is 5 x 5 x 0 with resolution 50
time for set_epsilon = 0.0759436 s
-----------
reading epsilon from file "/tmp/meepO1ahOS/test_load_dump_fields/structure.h5" (1)...
Loaded structure from file: /tmp/meepO1ahOS/test_load_dump_fields/structure.h5 (True)
reading fields from file "/tmp/meepO1ahOS/test_load_dump_fields/fields.h5" (1)...
Fields State:
a = 50, dt = 0.01
m = 0, beta = 0
t = 1500, phasein_time = 0, is_real = 0
num_chunks = 6 (shared=1)
Loaded fields from file: /tmp/meepO1ahOS/test_load_dump_fields/fields.h5 (True)
on time step 1500 (time=15), 7.26514 s/step
Thread 1 "python3.5" received signal SIGSEGV, Segmentation fault.
0x00007fffc8e6a8e0 in meep::step_curl_stride1 (f=0x55555681f940, c=meep::Bx, g1=0x0, g2=0x0, s1=0, s2=0, gv=...,
is=..., ie=..., dtdx=-0.5, dsig=meep::Y, sig=0x5555567505b0, kap=0x5555567503e0, siginv=0x555556750210,
fu=0x0, dsigu=meep::NO_DIRECTION, sigu=0x0, kapu=0x0, siginvu=0x0, dt=0.01, cnd=0x0, cndinv=0x0, fcnd=0x0)
at step_generic_stride1.cpp:192
192 f[i] = ((kap[k] - sig[k]) * f[i] - dtdx * (g1[i + s1] - g1[i])) * siginv[k];
The two terms in this update equation which are causing the segfault are g1[i + s1] and g1[i]: the frame above shows g1=0x0, i.e. a NULL pointer. In _load_dump_fields, the g1 array corresponds to the Ez field.
As additional info, the backtrace is:
#0 0x00007fffc8e6a8e0 in meep::step_curl_stride1 (f=0x55555681f940, c=meep::Bx, g1=0x0, g2=0x0, s1=0, s2=0,
gv=..., is=..., ie=..., dtdx=-0.5, dsig=meep::Y, sig=0x5555567505b0, kap=0x5555567503e0,
siginv=0x555556750210, fu=0x0, dsigu=meep::NO_DIRECTION, sigu=0x0, kapu=0x0, siginvu=0x0, dt=0.01, cnd=0x0,
cndinv=0x0, fcnd=0x0) at step_generic_stride1.cpp:192
#1 0x00007fffc8e03647 in meep::fields_chunk::step_db (this=0x5555567c4e10, ft=meep::B_stuff) at step_db.cpp:119
#2 0x00007fffc8e02961 in meep::fields::step_db (this=0x555556757c10, ft=meep::B_stuff) at step_db.cpp:36
#3 0x00007fffc8e0022e in meep::fields::step (this=0x555556757c10) at step.cpp:67
#4 0x00007fffc92ac56c in _wrap_fields_step (args=0x7fffc98d3f98) at meep-python.cxx:82395
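The update on line 192 of step_generic_stride1.cpp shown above can be mirrored in a few lines of Python to see why an unallocated g1 is fatal. This is only a sketch under simplifying assumptions: the signature is reduced to the arrays that appear in that one line, the PML index k is taken equal to i, and `None` stands in for the NULL pointer (g1=0x0) reported by gdb.

```python
def step_curl_stride1(f, g1, s1, dtdx, kap, sig, siginv):
    """Apply the update from line 192 to every entry of f, in place.

    f, kap, sig and siginv have n entries; g1 needs n + s1 entries.
    Passing g1=None mimics the NULL pointer seen in the traceback.
    """
    if g1 is None:
        # The C++ code dereferences g1 unconditionally; here we fail loudly.
        raise ValueError("g1 is not allocated; cannot take its finite difference")
    for i in range(len(f)):
        k = i  # assumption of this sketch: PML arrays share the field index
        f[i] = ((kap[k] - sig[k]) * f[i] - dtdx * (g1[i + s1] - g1[i])) * siginv[k]
    return f


# A well-formed call: g1 is allocated and has one extra entry for the stride.
updated = step_curl_stride1(
    f=[0.0, 0.0], g1=[1.0, 3.0, 6.0], s1=1, dtdx=-0.5,
    kap=[1.0, 1.0], sig=[0.0, 0.0], siginv=[1.0, 1.0],
)

# The failing case from the traceback: the source array was never allocated.
try:
    step_curl_stride1([0.0], None, 0, -0.5, [1.0], [0.0], [1.0])
    null_g1_rejected = False
except ValueError:
    null_g1_rejected = True
```

The point of the sketch is that the update reads g1 whether or not the component was ever allocated, so the loaded state must allocate exactly the components the stepping code expects.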
I think I know why Line 1985 in f5b60e5
Lines 2445 to 2450 in f5b60e5
Lines 495 to 517 in f5b60e5
Finally, Lines 541 to 561 in f5b60e5
Lines 465 to 493 in f5b60e5
What needs to happen then when dumping and loading the fields is that only those components generated by the particular source need to be saved and allocated. This means that we should save the source component as part of the checkpoint and use it to allocate the correct set of fields when loading the fields similar to how it is already set up.
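A minimal sketch of that idea, in plain Python rather than Meep's C++: the checkpoint records which components were allocated, and the loader allocates exactly those before copying data in. Everything here is hypothetical and for illustration only. JSON stands in for the HDF5 file, `None` stands in for a NULL pointer, and `dump_fields` / `load_fields` are not Meep functions.

```python
import json
import os
import tempfile


def dump_fields(path, fields):
    """Write only the allocated components, plus their names, to a JSON file."""
    allocated = {name: values for name, values in fields.items() if values is not None}
    payload = {"components": sorted(allocated), "data": allocated}
    with open(path, "w") as fh:
        json.dump(payload, fh)


def load_fields(path, all_components):
    """Rebuild the field table, allocating exactly the components that were saved."""
    with open(path) as fh:
        payload = json.load(fh)
    fields = {name: None for name in all_components}  # None mimics a NULL pointer
    for name in payload["components"]:
        fields[name] = list(payload["data"][name])
    return fields


# An Ez source in 2D excites Ez, Hx and Hy; Ex is never allocated.
all_components = ["Ez", "Hx", "Hy", "Ex"]
original = {"Ez": [0.1, 0.2], "Hx": [0.0, 0.3], "Hy": [0.4, 0.0], "Ex": None}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fields.json")
    dump_fields(path, original)
    restored = load_fields(path, all_components)
```

Because the component list travels with the data, the loader does not depend on the new simulation having been initialized with the same sources first.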
* Add support for fully local HDF5 files and shared dumping of meep::structure
* Update python func docs
* Update python API documentation
* Dump/Load of 'fields'. The PR is incomplete (does not include the dft fields) and also has a lot of debugging statements.
* Save dft chunks
* Remove debug log stmts
* Add saving of time value and also reorg tests.
* Fix dft-chunk saving for the single-parallel-file mode.
* Abort when trying to dump fields with non-null polarization state
* Clean up test_dump_load.py and add test_dump_fails_for_non_null_polarization_state test
* Add new tests/dump_load to set of files to ignore
* Clean up the test to remove unnecessary stuff from the copied over test
* Also dump/load 'f_w_prev'
* Fix typo causing build breakage
* load fields at the very end of init_sim after add_sources
* remove init_sim before loading fields since that now works.
It seems there is a memory leak in both |
Add support for dumping (fields::dump) and loading fields (fields::load). Together with structure::{dump,load}, this allows saving the simulation state at any arbitrary time-step and then restoring from this saved state to continue the simulation from that point.
This PR also adds a basic C++ test for dumping/loading the structure and fields. It also moves the existing structure dump/load tests into their own Python test (test_dump_load.py) and adds tests for dumping/loading fields.
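The save-then-resume workflow described above, and the way the tests check it, can be sketched without Meep using a toy time-stepper. The `ToySim` class below is hypothetical and is not part of Meep; it only illustrates that a checkpoint must carry the time-step counter along with the state, and that a restored run should reproduce the samples of an uninterrupted one.

```python
import json


class ToySim:
    """A tiny oscillator standing in for a time-stepped simulation."""

    def __init__(self, dt=0.01):
        self.dt = dt
        self.t = 0      # integer time-step counter
        self.x = 1.0    # "field" value
        self.v = 0.0    # its conjugate variable

    def step(self):
        self.v -= self.dt * self.x
        self.x += self.dt * self.v
        self.t += 1

    def run(self, steps, every, record):
        """Advance `steps` steps, sampling x into `record` every `every` steps."""
        for _ in range(steps):
            self.step()
            if self.t % every == 0:
                record[self.t] = self.x

    def dump(self):
        return json.dumps({"t": self.t, "x": self.x, "v": self.v})

    def load(self, blob):
        state = json.loads(blob)
        self.t, self.x, self.v = state["t"], state["x"], state["v"]


ref = {}
sim1 = ToySim()
sim1.run(2500, 500, ref)     # run to step 2500
checkpoint = sim1.dump()     # save the state mid-run
sim1.run(2500, 500, ref)     # continue to step 5000

restored = {}
sim2 = ToySim()
sim2.load(checkpoint)        # restore instead of re-running the first half
sim2.run(2500, 500, restored)

# Samples taken after the restore are compared against the uninterrupted run.
mismatches = [t for t, x in restored.items() if ref[t] != x]
```

Keying the samples by the integer step count rather than a floating-point time avoids dictionary lookups that depend on exact float equality.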